Gender Inference for Arabic Language in Social Media
Abstract
The widespread use of social media has attracted a new group of researchers seeking information on who, what, and where the users are. Some information retrieval researchers are interested in identifying the gender, age group, and educational level of users. The objective of this work is to identify gender from Arabic posts on social media. Most work on gender classification has addressed English-language social media content; work on other languages, such as Arabic, is almost nonexistent. People typically express themselves on social media in colloquial language, so this study is geared towards identifying gender using the Saudi dialect of Arabic. To solve the gender identification problem, the authors introduce a novel method called k-Top Vector (k-TV), which is based on the top k words, selected using word occurrences and stem frequencies. Part of this work required compiling a dataset of Saudi dialect words, for which a well-known, widely used social site was used. To test the system, the authors compiled 1200 samples, split equally between both genders. They trained Support Vector Machine (SVM) and k-NN classifiers using different numbers of samples for training and testing. SVM performed better, achieving an accuracy of 95% for gender classification.
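The abstract does not define k-TV in detail, so the following is only a minimal sketch of the general idea under stated assumptions: a vocabulary of the k most frequent words is built from the training posts, each post becomes a count vector over that vocabulary, and a plain k-NN vote assigns a label. The function names, the toy tokens, and the labels `label_a` and `label_b` are hypothetical, stemming is omitted, and the SVM classifier reported in the abstract is not reproduced here.

```python
from collections import Counter
import math


def tokenize(text):
    # Assumption: whitespace tokenization; the paper's preprocessing is not given.
    return text.split()


def top_k_words(docs, k):
    # Rank words by total occurrences; ties are broken alphabetically
    # so the resulting vocabulary is deterministic.
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def vectorize(text, vocab):
    # One count per vocabulary word, in vocabulary order.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


def knn_predict(train_vectors, train_labels, query, n_neighbors=3):
    # Majority vote among the n_neighbors closest training vectors
    # (Euclidean distance).
    distances = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: placeholder tokens, not real dialect words.
train_docs = ["aa bb aa cc", "aa aa bb", "dd ee dd", "dd ee ee ff"]
train_labels = ["label_a", "label_a", "label_b", "label_b"]

vocab = top_k_words(train_docs, k=4)
train_vectors = [vectorize(doc, vocab) for doc in train_docs]
query_vector = vectorize("aa bb bb", vocab)
prediction = knn_predict(train_vectors, train_labels, query_vector)
print(vocab, query_vector, prediction)
```

Keeping the vocabulary small (the top k words only) keeps the vectors short, which is one plausible reason a top-k representation suits short, noisy social media posts; the value of k used in the paper is not stated in the abstract.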
Related resources
Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media
Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive twits to others via social media, which is known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...
Author Profiling for Arabic Tweets based on n-grams
This paper presents an approach for author profiling of unknown users from their texts produced in social media. In particular, we address the identification of two profile dimensions, gender and language variety, of Arabic Twitter users based on their tweets. Our approach focused on applying a metaclassification technique to features extracted from the tweet body. We explored two main sets of fe...
Gender Inference of Twitter Users in Non-English Contexts
While much work has considered the problem of latent attribute inference for users of social media such as Twitter, little has been done on non-English-based content and users. Here, we conduct the first assessment of latent attribute inference in languages beyond English, focusing on gender inference. We find that the gender inference problem in quite diverse languages can be addressed using e...
Italian Political Communication and Gender Bias: Press Representations of Men/Women Presidents of the Houses of Parliament (1979, 1994, and 2013)
The study considers mass media communication as intertwined with social norms, as assumed by the perspective of social representations. It explores the Italian press communication by focusing on three pairs of men and women politicians with different political orientations and all serving as presidents of the Houses of Parliament in three legislatures. The article concentrates on five newspaper...
Social Media Writing and Social Class: A Correlational Analysis of Adolescent CMC and Social Background
In a large social media corpus (2.9 million tokens), we analyze Flemish adolescents’ non-standard writing practices and look for correlations with the teenagers’ social class. Three different aspects of adolescents’ social background are included: educational track, parental profession, and home language. Since the data reveal that these parameters are highly correlated, we combine them into on...
Journal title:
- IJKSR
Volume: 5
Publication date: 2014